1 Introduction

Over the past decade, a significant amount of progress has been made in the field of hardness of 
approximation via results based on the conjectured hardness of certain forms of the Label Cover 
problem. The Unique Games Conjecture (UGC) of Khot [Kho02] states that it is NP-hard to 
distinguish between nearly satisfiable and almost completely unsatisfiable instances of Unique, or 
1-to-1, Label Cover. Using the UGC as a starting point, we now have optimal inapproximability
results for Vertex Cover [KR03], Max-Cut [KKMO07], and many other basic constraint satisfaction
problems (CSPs). Indeed, assuming the UGC we have essentially optimal inapproximability results
for all CSPs [Rag08]. In short, modulo the understanding of Unique Label Cover itself, we have 
an excellent understanding of the (in-)approximability of a wide range of problems. 

Where the UGC's explanatory powers falter is in pinning down the approximability of satisfiable
CSPs, that is, the task of finding a good assignment to a CSP when guaranteed that the CSP
is fully satisfiable. For example, we know from the work of Håstad [Hås01] that given a fully
satisfiable 3Sat instance, it is NP-hard to satisfy a $\frac{7}{8} + \epsilon$ fraction of the clauses for any $\epsilon > 0$. However,
given a fully satisfiable 1-to-1 Label Cover instance, it is completely trivial to find a fully satisfying
assignment. Thus the UGC cannot be used as the starting point for hardness results for satisfiable
CSPs. Because of this, Khot additionally posed his $d$-to-1 Conjectures:

Conjecture 1.1 ([Kho02]). For every integer $d \geq 2$ and $\epsilon > 0$, there is a label set size $q$ such that
it is NP-hard to $(1, \epsilon)$-decide the $d$-to-1 Label Cover problem.

Here by $(c, s)$-deciding a CSP we mean the task of determining whether an instance is at least
$c$-satisfiable or less than $s$-satisfiable. It is well known (from the Parallel Repetition Theorem [FK94,
Raz95]) that the conjecture is true if $d$ is allowed to depend on $\epsilon$. The strength of this conjecture,
therefore, is that it is stated for each fixed $d$ greater than 1.

The $d$-to-1 Conjectures have been used to resolve the approximability of several basic "satisfiable
CSP" problems. The first result along these lines was due to Dinur, Mossel, and Regev [DMR09],
who showed that the 2-to-1 Conjecture implies that it is NP-hard to $C$-color a 4-colorable graph
for any constant $C$. (They also showed hardness for 3-colorable graphs via another Unique Games
variant.) O'Donnell and Wu [OW09] showed that assuming the $d$-to-1 Conjecture for any fixed $d$
implies that it is NP-hard to $(1, \frac{5}{8} + \epsilon)$-approximate instances of a certain 3-bit predicate, the
"Not Two" predicate. This is an optimal result among all 3-bit predicates, since Zwick [Zwi98]
showed that every satisfiable 3-bit CSP instance can be efficiently $\frac{5}{8}$-approximated. In another
example, Guruswami and Sinop [GS09] have shown that the 2-to-1 Conjecture implies that given
a $q$-colorable graph, it is NP-hard to find a $q$-coloring in which less than a $(\frac{1}{q} - O(\frac{\ln q}{q^2}))$ fraction of
the edges are monochromatic. This result would be tight up to the $O(\cdot)$ by an algorithm of Frieze
and Jerrum [FJ97]. It is therefore clear that settling the $d$-to-1 Conjectures, especially in the most
basic case of $d = 2$, is an important open problem.

Regarding the hardness of the 2-to-1 Label Cover problem, the only evidence we have is a
family of integrality gaps for the canonical SDP relaxation of the problem, in [GKO+10]. Regarding
algorithms for the problem, an important recent line of work beginning in [ABS10] (see
also [BRS11, GS11, Ste10]) has sought subexponential-time algorithms for Unique Label Cover
and related problems. In particular, Steurer [Ste10] has shown that for any constant $\beta > 0$ and
label set size, there is an $\exp(O(n^{\beta}))$-time algorithm which, given a satisfiable 2-to-1 Label Cover
instance, finds an assignment satisfying an $\exp(-O(1/\beta^2))$-fraction of the constraints. E.g., for a
fixed small constant $\beta$ there is a $2^{O(n^{\beta})}$-time algorithm which $(1, s_0)$-approximates 2-to-1 Label
Cover, where $s_0 > 0$ is a certain universal constant.

In light of this, it is interesting not only to seek NP-hardness results for certain approximation
thresholds, but to additionally seek evidence that nearly full exponential time is required for these
thresholds. This can be done by assuming the Exponential Time Hypothesis (ETH) [IP01] and by
reducing from the Moshkovitz-Raz Theorem [MR10], which shows a near-linear size reduction
from 3Sat to the standard Label Cover problem with subconstant soundness. In this work, we
show reductions from 3Sat to the problem of $(1, s + \epsilon)$-approximating several CSPs, for certain
values of $s$ and for all $\epsilon > 0$. In fact, though we omit it in our theorem statements, it can be
checked that all of the reductions in this paper are quasilinear in size for $\epsilon = \epsilon(n) = \frac{1}{(\log \log n)^{\beta}}$
for some $\beta > 0$.

1.1 Our results 

In this paper, we focus on proving NP-hardness for the 2-to-1 Label Cover problem. To the best
of our knowledge, no explicit NP-hardness factor has previously been stated in the literature.
However, it is "folklore" that one can obtain an explicit one for label set sizes 3 & 6 by performing
the "constraint-variable" reduction on an NP-hardness result for 3-coloring (more precisely,
Max-3-Colorable-Subgraph). The best known hardness for 3-coloring is due to Guruswami and
Sinop [GS09], who showed a factor $\frac{32}{33}$-hardness via a somewhat involved gadget reduction from the
3-query adaptive PCP result of [GLST98]. This yields NP-hardness of $(1, \frac{65}{66} + \epsilon)$-approximating
2-to-1 Label Cover with label set sizes 3 & 6. It is not known how to take advantage of larger label
set sizes. On the other hand, for label set sizes 2 & 4 it is known that a satisfying assignment to any
satisfiable 2-to-1 Label Cover instance can be found in polynomial time.

The main result of our paper gives an improved hardness result: 

Theorem 1.2. For all $\epsilon > 0$, $(1, \frac{23}{24} + \epsilon)$-deciding the 2-to-1 Label Cover problem with label set sizes
3 & 6 is NP-hard.

By duplicating labels, this result also holds for label set sizes $3k$ & $6k$ for any $k \in \mathbb{N}^+$.

Let us describe the high-level idea behind our result. The folklore constraint-variable reduction
from 3-coloring to 2-to-1 Label Cover would work just as well if we started from "3-coloring with
literals" instead. By this we mean the CSP with domain $\mathbb{Z}_3$ and constraints of the form
"$v_i - v_j \neq c \pmod 3$". Starting from this CSP, which we call 2NLin($\mathbb{Z}_3$), has two benefits: first, it is at
least as hard as 3-coloring and hence could yield a stronger hardness result; second, it is a bit more
"symmetrical" for the purposes of designing reductions. We obtain the following hardness result
for 2NLin($\mathbb{Z}_3$).

Theorem 1.3. For all $\epsilon > 0$, it is NP-hard to $(1, \frac{11}{12} + \epsilon)$-decide the 2NLin problem.

As 3-coloring is a special case of 2NLin($\mathbb{Z}_3$), [GS09] also shows that $(1, \frac{32}{33} + \epsilon)$-deciding 2NLin is
NP-hard for all $\epsilon > 0$, and to our knowledge this was previously the only hardness known for
2NLin($\mathbb{Z}_3$). The best current algorithm achieves an approximation ratio of 0.836 (and does not
need the instance to be satisfiable) [GW04]. To prove Theorem 1.3, we proceed by designing an
appropriate "function-in-the-middle" dictator test, as in the recent framework of [OW12]. Although
the [OW12] framework gives a direct translation of certain types of function-in-the-middle tests into
hardness results, we cannot employ it in a black-box fashion. Among other reasons, [OW12] assumes
that the test has "built-in noise", but we cannot afford this as we need our test to have perfect
completeness.

Thus, we need a different proof to derive a hardness result from this function-in-the-middle test.
We first were able to accomplish this by an analysis similar to the Fourier-based proof of 2Lin($\mathbb{Z}_2$)
hardness given in Appendix F of [OW12]. Just as that proof "reveals" that the function-in-the-middle
2Lin($\mathbb{Z}_2$) test can be equivalently thought of as Håstad's 3Lin($\mathbb{Z}_2$) test composed with the
3Lin($\mathbb{Z}_2$)-to-2Lin($\mathbb{Z}_2$) gadget of [TSSW00], our proof for the 2NLin($\mathbb{Z}_3$) function-in-the-middle test
revealed it to be the composition of a function test for a certain four-variable CSP with a gadget.
We have called the particular four-variable CSP 4-Not-All-There, or 4NAT for short. Because it
is a 4-CSP, we are able to prove the following NP-hardness of approximation result for it using a
classic, Håstad-style Fourier-analytic proof.

Theorem 1.4. For all $\epsilon > 0$, it is NP-hard to $(1, \frac{2}{3} + \epsilon)$-decide the 4NAT problem.

Thus, the final form in which we present our Theorem 1.2 is as a reduction from Label Cover
to 4NAT using a function test (yielding Theorem 1.4), followed by a 4NAT-to-2NLin($\mathbb{Z}_3$) gadget
(yielding Theorem 1.3), followed by the constraint-variable reduction to 2-to-1 Label Cover. Indeed,
all of the technology needed to carry out this proof was in place for over a decade, but without
the function-in-the-middle framework of [OW12] it seems that pinpointing the 4NAT predicate as
a good starting point would have been unlikely.

1.2 Organization 

We leave to Section 2 most of the definitions, including those of the CSPs we use. The heart of
the paper is in Section 3, where we give both the 2NLin($\mathbb{Z}_3$) and 4NAT function tests, explain how
one is derived from the other, and then perform the Fourier analysis for the 4NAT test. The actual
hardness proof for 4NAT is presented in Section 4, and it mostly follows the techniques put in place
by Håstad in [Hås01].

2 Preliminaries 

We primarily work with strings $x \in \mathbb{Z}_3^K$ for some integer $K$. We write $x_i$ to denote the $i$th coordinate
of $x$. Oftentimes, our strings $y \in \mathbb{Z}_3^{dK}$ are "blocked" into $K$ "blocks" of size $d$. In this case, we
write $y[i] \in \mathbb{Z}_3^d$ for the $i$th block of $y$, and $(y[i])_j \in \mathbb{Z}_3$ for the $j$th coordinate of this block. Define
the function $\pi : [dK] \to [K]$ such that $\pi(k) = i$ if $k$ falls in the $i$th block of size $d$ (e.g., $\pi(k) = 1$
for $1 \leq k \leq d$, $\pi(k) = 2$ for $d + 1 \leq k \leq 2d$, and so on).
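
As a small illustration of the blocking convention, the following Python sketch implements the block map $\pi$ and splits a string into its blocks. The helper names `block_of` and `blocks` are hypothetical (they do not appear in the paper), and indices are 1-based to match the text.

```python
def block_of(k, d):
    """Return pi(k): the 1-based index of the size-d block containing position k."""
    return (k - 1) // d + 1

def blocks(y, d):
    """Split a string y of length d*K into its K blocks y[1], ..., y[K]."""
    return [tuple(y[i:i + d]) for i in range(0, len(y), d)]

d, K = 3, 2
y = (0, 1, 2, 2, 2, 0)                      # a string in Z_3^{dK}
pi_values = [block_of(k, d) for k in range(1, d * K + 1)]
y_blocks = blocks(y, d)
```

Here positions 1 through 3 map to block 1 and positions 4 through 6 map to block 2, as in the example given in the definition of $\pi$.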

2.1 Definitions of problems 

An instance $\mathcal{I}$ of a constraint satisfaction problem (CSP) is a set of variables $V$, a set of labels $D$, and
a weighted list of constraints on these variables. We assume that the weights of the constraints are
nonnegative and sum to 1. The weights therefore induce a probability distribution on the constraints.
Given an assignment to the variables $f : V \to D$, the value of $f$ is the probability that $f$ satisfies a
constraint drawn from this probability distribution. The optimum of $\mathcal{I}$ is the highest value of any
assignment. We say that $\mathcal{I}$ is $s$-satisfiable if its optimum is at least $s$. If it is 1-satisfiable we
simply call it satisfiable.

We define a CSP $\mathcal{P}$ to be a set of CSP instances. Typically, these instances will have similar
constraints. We will study the problem of $(c, s)$-deciding $\mathcal{P}$. This is the problem of determining
whether an instance of $\mathcal{P}$ is at least $c$-satisfiable or less than $s$-satisfiable. Related is the problem
of $(c, s)$-approximating $\mathcal{P}$, in which one is given a $c$-satisfiable instance of $\mathcal{P}$ and asked to find
an assignment of value at least $s$. It is easy to see that $(c, s)$-deciding $\mathcal{P}$ is at least as easy as
$(c, s)$-approximating $\mathcal{P}$. Thus, as all our hardness results are for $(c, s)$-deciding CSPs, we also prove
hardness for $(c, s)$-approximating these CSPs.

We now state the three CSPs that are the focus of our paper.

2-NLin($\mathbb{Z}_3$): In this CSP the label set is $\mathbb{Z}_3$ and the constraints are of the form

$$v_i - v_j \neq a \pmod 3, \qquad a \in \mathbb{Z}_3.$$

The special case when each RHS is 0 is the 3-coloring problem. We often drop the ($\mathbb{Z}_3$) from this
notation and simply write 2NLin. The reader may think of the 'N' in 2NLin($\mathbb{Z}_3$) as standing for
'N'on-linear, although we prefer to think of it as standing for 'N'early-linear. The reason is that
when generalizing to moduli $q > 3$, the techniques in this paper generalize to constraints of the
form "$v_i - v_j \pmod q \in \{a, a + 1\}$" rather than "$v_i - v_j \neq a \pmod q$". For the ternary version
of this constraint, "$v_i - v_j + v_k \pmod q \in \{a, a + 1\}$", it is folklore$^1$ that a simple modification of
Håstad's work [Hås01] yields NP-hardness of $(1, \frac{2}{q})$-approximation.

4-Not-All-There: For the 4-Not-All-There problem, denoted 4NAT, we define 4NAT $: \mathbb{Z}_3^4 \to \{0, 1\}$
to have output 1 if and only if at least one of the elements of $\mathbb{Z}_3$ is not present among the four
inputs. The 4NAT CSP has label set $D = \mathbb{Z}_3$ and constraints of the form
4NAT$(v_1 + k_1, v_2 + k_2, v_3 + k_3, v_4 + k_4) = 1$, where the $k_i$'s are constants in $\mathbb{Z}_3$.

We additionally define the "Two Pairs" predicate TwoPair $: \mathbb{Z}_3^4 \to \{0, 1\}$, which has output 1
if and only if its input contains two distinct elements of $\mathbb{Z}_3$, each appearing twice. Note that an
input which satisfies TwoPair also satisfies 4NAT.
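
The two predicates are small enough to enumerate exhaustively. The following Python sketch (the function names `four_nat` and `two_pair` are our own, chosen for illustration) encodes both definitions, counts their satisfying inputs among the 81 tuples of $\mathbb{Z}_3^4$, and checks that TwoPair implies 4NAT.

```python
from fractions import Fraction
from itertools import product

Z3 = (0, 1, 2)

def four_nat(a, b, c, d):
    """1 iff at least one element of Z_3 is absent from the four inputs."""
    return int(len({a, b, c, d}) < 3)

def two_pair(a, b, c, d):
    """1 iff the inputs consist of two distinct values, each appearing twice."""
    s = sorted((a, b, c, d))
    return int(s[0] == s[1] != s[2] == s[3])

tuples = list(product(Z3, repeat=4))
nat_count = sum(four_nat(*t) for t in tuples)      # satisfying inputs of 4NAT
pair_count = sum(two_pair(*t) for t in tuples)     # satisfying inputs of TwoPair
implies = all(four_nat(*t) for t in tuples if two_pair(*t))
random_value = Fraction(nat_count, len(tuples))    # value of a uniformly random input
```

The enumeration finds 45 satisfying inputs for 4NAT and 18 for TwoPair, so a uniformly random input satisfies 4NAT with probability 45/81 = 5/9.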

$d$-to-1 Label Cover: An instance of the $d$-to-1 Label Cover problem is a bipartite graph $G =
(U \cup V, E)$, a label set size $K$, and a $d$-to-1 map $\pi_e : [dK] \to [K]$ for each edge $e \in E$. The elements
of $U$ are labeled from the set $[K]$, and the elements of $V$ are labeled from the set $[dK]$. A labeling
$f : U \cup V \to [dK]$ satisfies an edge $e = (u, v)$ if $\pi_e(f(v)) = f(u)$. Of particular interest is the $d = 2$
case, i.e., 2-to-1 Label Cover.

Label Cover serves as the starting point for most NP-hardness of approximation results. We 
use the following theorem of Moshkovitz and Raz: 

Theorem 2.1 ([MR10]). For any $\epsilon = \epsilon(n) \geq n^{-o(1)}$ there exist $K, d \leq 2^{\mathrm{poly}(1/\epsilon)}$ such that the
problem of deciding a 3Sat instance of size $n$ can be Karp-reduced in $\mathrm{poly}(n)$ time to the problem
of $(1, \epsilon)$-deciding a $d$-to-1 Label Cover instance of size $n^{1+o(1)}$ with label set size $K$.

2.2 Gadgets 

A typical way of relating two separate CSPs is by constructing a gadget reduction which translates
from one to the other. A gadget reduction from CSP$_1$ to CSP$_2$ is one which maps any CSP$_1$
constraint into a weighted set of CSP$_2$ constraints. The CSP$_2$ constraints are over the same set of
variables as the CSP$_1$ constraint, plus some new, auxiliary variables (these auxiliary variables are
not shared between constraints of CSP$_1$). We require that for every assignment which satisfies the
CSP$_1$ constraint, there is a way to label the auxiliary variables to fully satisfy the CSP$_2$ constraints.
Furthermore, there is some parameter $0 < \gamma < 1$ such that for every assignment which does not
satisfy the CSP$_1$ constraint, the optimum labeling to the auxiliary variables will satisfy exactly a $\gamma$
fraction of the CSP$_2$ constraints. Such a gadget reduction we call a $\gamma$-gadget-reduction from CSP$_1$
to CSP$_2$. The following proposition is well-known:

$^1$ Venkatesan Guruswami, Subhash Khot, personal communications.

Proposition 2.2. Suppose it is NP-hard to $(c, s)$-decide CSP$_1$. If there exists a $\gamma$-gadget-reduction
from CSP$_1$ to CSP$_2$, then it is NP-hard to $(c + (1 - c)\gamma, s + (1 - s)\gamma)$-decide CSP$_2$.

We note that the notation $\gamma$-gadget-reduction is similar to a piece of notation employed by
[TSSW00], but the two have different (though related) definitions.

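2.3 Fourier analysis on $\mathbb{Z}_3$

Let $\omega = e^{2\pi i/3}$ and set $U_3 = \{\omega^0, \omega^1, \omega^2\}$. For $\alpha \in \mathbb{Z}_3^n$, consider the Fourier character $\chi_\alpha : \mathbb{Z}_3^n \to U_3$
defined as $\chi_\alpha(x) = \omega^{\alpha \cdot x}$. Then it is easy to see that $\mathbf{E}[\chi_\alpha(x)\overline{\chi_\beta(x)}] = \mathbf{1}[\alpha = \beta]$, where here and
throughout $x$ has the uniform probability distribution on $\mathbb{Z}_3^n$ unless otherwise specified. As a
result, the Fourier characters form an orthonormal basis for the set of functions $f : \mathbb{Z}_3^n \to U_3$ under
the inner product $\langle f, g \rangle = \mathbf{E}[f(x)\overline{g(x)}]$; i.e.,

$$f = \sum_{\alpha \in \mathbb{Z}_3^n} \hat{f}(\alpha)\chi_\alpha,$$

where the $\hat{f}(\alpha)$'s are complex numbers defined as $\hat{f}(\alpha) = \mathbf{E}[f(x)\overline{\chi_\alpha(x)}]$. For $\alpha \in \mathbb{Z}_3^n$, we use the
notation $|\alpha|$ to denote $\sum_i \alpha_i$ and $\#\alpha$ to denote the number of nonzero coordinates in $\alpha$. When $d$ is
clear from context and $\alpha \in \mathbb{Z}_3^{dK}$, define $\pi_3(\alpha) \in \mathbb{Z}_3^K$ so that $(\pi_3(\alpha))_i \equiv |\alpha[i]| \pmod 3$ (recall the
notation $\alpha[i]$ from the beginning of this section).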

We have Parseval's identity: for every $f : \mathbb{Z}_3^n \to U_3$ it holds that $\sum_{\alpha \in \mathbb{Z}_3^n} |\hat{f}(\alpha)|^2 = 1$. Note
that this implies that $|\hat{f}(\alpha)| \leq 1$ for all $\alpha$, as otherwise $|\hat{f}(\alpha)|^2$ would be greater than 1. A function
$f : \mathbb{Z}_3^n \to \mathbb{Z}_3$ is said to be folded if for every $x \in \mathbb{Z}_3^n$ and $c \in \mathbb{Z}_3$, it holds that $f(x + c) = f(x) + c$,
where $(x + c)_i = x_i + c$.

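Proposition 2.3. Let $f : \mathbb{Z}_3^n \to U_3$ be folded. Then $\hat{f}(\alpha) \neq 0 \implies |\alpha| \equiv 1 \pmod 3$.

Proof.

$$\hat{f}(\alpha) = \mathbf{E}[f(x + 1)\overline{\chi_\alpha(x + 1)}] = \mathbf{E}[\omega f(x)\overline{\chi_\alpha(x)}\,\overline{\chi_\alpha(1, 1, \ldots, 1)}] = \omega\,\overline{\chi_\alpha(1, 1, \ldots, 1)}\,\hat{f}(\alpha).$$

This means that whenever $\hat{f}(\alpha) \neq 0$, the factor $\omega\,\overline{\chi_\alpha(1, 1, \ldots, 1)}$ must be 1. Expanding this quantity,

$$\omega\,\overline{\chi_\alpha(1, 1, \ldots, 1)} = \omega^{1 - \alpha \cdot (1, 1, \ldots, 1)} = \omega^{1 - |\alpha|}.$$

So, $|\alpha| \equiv 1 \pmod 3$, as promised. □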
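
Proposition 2.3 can be sanity-checked numerically on a small example. The sketch below is illustrative only: it builds a random folded function on $\mathbb{Z}_3^3$ (the helper names `random_folded` and `fourier_coefficient`, the choice $n = 3$, and the random seed are all assumptions of this example), computes every Fourier coefficient of $\omega^f$ by brute force, and records which ones are nonzero.

```python
import cmath
import random
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)

def random_folded(n, rng):
    """Random f: Z_3^n -> Z_3 with f(x + c) = f(x) + c, stored as a dict."""
    f = {}
    for x in product(range(3), repeat=n):
        if x[0] == 0:                        # one representative per shift orbit
            base = rng.randrange(3)
            for c in range(3):
                shifted = tuple((xi + c) % 3 for xi in x)
                f[shifted] = (base + c) % 3
    return f

def fourier_coefficient(f, alpha):
    """hat f(alpha) = E[omega^{f(x)} * conj(chi_alpha(x))] for uniform x."""
    total = 0
    for x, value in f.items():
        dot = sum(a * xi for a, xi in zip(alpha, x)) % 3
        total += OMEGA ** value * OMEGA ** (-dot)
    return total / len(f)

n = 3
f = random_folded(n, random.Random(0))
all_alpha = list(product(range(3), repeat=n))
support = [a for a in all_alpha if abs(fourier_coefficient(f, a)) > 1e-9]
parseval = sum(abs(fourier_coefficient(f, a)) ** 2 for a in all_alpha)
```

Every $\alpha$ in `support` should satisfy $|\alpha| \equiv 1 \pmod 3$, and `parseval` should equal 1 up to floating-point error.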

3 2-to-1 hardness

In this section, we give our hardness result for 2-to-1 Label Cover, following the proof outline
described at the end of Section 1.1. 

Theorem 1.2 (restated). For all $\epsilon > 0$, it is NP-hard to $(1, \frac{23}{24} + \epsilon)$-decide the 2-to-1 Label Cover
problem.

First, we state a pair of simple gadget reductions:

Lemma 3.1. There is a $\frac{3}{4}$-gadget-reduction from 4NAT to 2NLin.

Lemma 3.2. There is a $\frac{1}{2}$-gadget-reduction from 2NLin to 2-to-1.

Together with Proposition 2.2, these imply the following corollary:

Corollary 3.3. There is a $\frac{7}{8}$-gadget-reduction from 4NAT to 2-to-1. Thus, if it is NP-hard to
$(c, s)$-decide the 4NAT problem, then it is NP-hard to $((7 + c)/8, (7 + s)/8)$-decide the 2-to-1 Label
Cover problem.
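
The arithmetic behind the corollary is easy to trace with exact fractions. The following Python sketch applies the formula of Proposition 2.2 twice, starting from the 4NAT soundness $\frac{2}{3}$ of Theorem 1.4. The helper names `apply_gadget` and `compose` are hypothetical, and the rule used in `compose` for chaining two gadgets is the one implied by applying Proposition 2.2 twice, stated here as an assumption.

```python
from fractions import Fraction

def apply_gadget(c, s, gamma):
    """Proposition 2.2: (c, s)-hardness becomes (c + (1-c)g, s + (1-s)g)-hardness."""
    return c + (1 - c) * gamma, s + (1 - s) * gamma

def compose(gamma1, gamma2):
    """Parameter of the gadget obtained by chaining a gamma1- and a gamma2-gadget."""
    return gamma1 + (1 - gamma1) * gamma2

four_nat = (Fraction(1), Fraction(2, 3))               # (c, s) from Theorem 1.4
two_nlin = apply_gadget(*four_nat, Fraction(3, 4))     # Lemma 3.1
two_to_one = apply_gadget(*two_nlin, Fraction(1, 2))   # Lemma 3.2
combined = compose(Fraction(3, 4), Fraction(1, 2))     # Corollary 3.3
```

This gives soundness 11/12 for 2NLin and 23/24 for 2-to-1 Label Cover, with completeness staying at 1, and a combined gadget parameter of 7/8.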

The gadget reduction from 4NAT to 2NLin relies on the simple fact that if $a, b, c, d \in \mathbb{Z}_3$ satisfy the
4NAT predicate, then there is some element of $\mathbb{Z}_3$ that none of them equal.

Proof of Lemma 3.1. A 4NAT constraint $C$ on the variables $S = (v_1, v_2, v_3, v_4)$ is of the form

$$\text{4NAT}(v_1 + k_1, v_2 + k_2, v_3 + k_3, v_4 + k_4),$$

where the $k_i$'s are all constants in $\mathbb{Z}_3$. To create the 2NLin instance, introduce the auxiliary variable
$y_C$ and add the four 2NLin equations

$$v_i + k_i \neq y_C \pmod 3, \qquad i \in [4]. \tag{1}$$

If $f : S \to \mathbb{Z}_3$ is an assignment which satisfies the 4NAT constraint, then there is some $a \in \mathbb{Z}_3$
such that $f(v_i) + k_i \neq a \pmod 3$ for all $i \in [4]$. Assigning $a$ to $y_C$ satisfies all four equations (1).
On the other hand, if $f$ doesn't satisfy the 4NAT constraint, then $\{f(v_i) + k_i\}_{i \in [4]} = \mathbb{Z}_3$, so no
assignment to $y_C$ satisfies all four equations. However, it is easy to see that there is an assignment
which satisfies three of the equations. This gives a $\frac{3}{4}$-gadget-reduction from 4NAT to 2NLin, which
proves the lemma. □
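
The case analysis in this proof can be confirmed by brute force. The Python sketch below (with hypothetical helper names) ranges over all 81 possible values of the shifted tuple $(f(v_i) + k_i)_{i \in [4]}$ and computes the best number of equations (1) that any label for $y_C$ can satisfy.

```python
from itertools import product

def four_nat(values):
    """True iff some element of Z_3 is missing from the four values."""
    return len(set(values)) < 3

def best_gadget_value(values):
    """Max over y_C in Z_3 of the number of satisfied equations values[i] != y_C."""
    return max(sum(v != y for v in values) for y in range(3))

results = {}
for shifted in product(range(3), repeat=4):    # shifted[i] = f(v_i) + k_i mod 3
    results[shifted] = (four_nat(shifted), best_gadget_value(shifted))

# Satisfied 4NAT constraints allow all 4 equations; unsatisfied ones allow exactly 3.
gadget_ok = all(best == (4 if sat else 3) for sat, best in results.values())
```

The check confirms the two cases of the proof: four satisfiable equations when 4NAT holds, and exactly three otherwise, matching the parameter $\frac{3}{4}$.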

The reduction from 2NLin to 2-to-1 Label Cover is the well-known constraint-variable reduction,
and uses the fact that in the equation $v_i - v_j \neq a \pmod 3$, for any assignment to $v_j$ there are two
valid assignments to $v_i$, and vice versa.

Proof of Lemma 3.2. A 2NLin constraint $C$ on the variables $S = (v_1, v_2)$ is of the form

$$v_1 - v_2 \neq a \pmod 3,$$

for some $a \in \mathbb{Z}_3$. To create the 2-to-1 Label Cover instance, introduce the variable $y_C$ which will
be labeled by one of the six possible functions $g : S \to \mathbb{Z}_3$ which satisfy $C$. Finally, introduce the
2-to-1 constraints $y_C(v_1) = f(v_1)$ and $y_C(v_2) = f(v_2)$.

If $f : S \to \mathbb{Z}_3$ is an assignment which satisfies the 2NLin constraint, then we label $y_C$ with $f$.
In this case,

$$y_C(v_i) = f(v_i), \qquad i = 1, 2.$$

Thus, both equations are satisfied. On the other hand, if $f$ does not satisfy the 2NLin constraint,
then any $g$ which $y_C$ is labeled with disagrees with $f$ on at least one of $v_1$ or $v_2$. It is easy to see,
though, that a $g$ can be selected to satisfy one of the two equations. This gives a $\frac{1}{2}$-gadget-reduction
from 2NLin to 2-to-1, which proves the lemma. □
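
As with Lemma 3.1, the constraint-variable gadget is small enough to verify exhaustively. In the Python sketch below (helper names are our own), a label for $y_C$ is a pair $(g(v_1), g(v_2))$ satisfying the constraint. The code checks that there are six such labels, that each projection is 2-to-1, and that an unsatisfying $f$ can be matched on exactly one of the two variables.

```python
from itertools import product

def satisfying_labels(a):
    """The assignments g = (g(v1), g(v2)) with g(v1) - g(v2) != a (mod 3)."""
    return [g for g in product(range(3), repeat=2) if (g[0] - g[1]) % 3 != a]

def best_agreement(f, a):
    """Max over labels g for y_C of how many of the two projection constraints hold."""
    return max((g[0] == f[0]) + (g[1] == f[1]) for g in satisfying_labels(a))

checks = []
for a in range(3):
    labels = satisfying_labels(a)
    # Each value of v1 (resp. v2) is the projection of exactly two labels.
    two_to_one = all(sum(g[0] == v for g in labels) == 2 and
                     sum(g[1] == v for g in labels) == 2 for v in range(3))
    for f in product(range(3), repeat=2):
        satisfied = (f[0] - f[1]) % 3 != a
        checks.append(len(labels) == 6 and two_to_one and
                      best_agreement(f, a) == (2 if satisfied else 1))
```

All 27 combinations of $a$ and $f$ pass, which matches the parameter $\frac{1}{2}$ claimed in the lemma.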

3.1 A pair of tests 

Now that we have shown that 2NLin hardness results translate into 2-to-1 Label Cover hardness
results, we present our 2NLin function test. Even though we don't directly use it, it helps explain 
how we were led to consider the 4NAT CSP. Furthermore, the Fourier analysis that we eventually 
use for the 4NAT Test could instead be performed directly on the 2NLin Test without any direct 
reference to the 4NAT predicate. The test is:

2NLin Test

Given folded functions $f : \mathbb{Z}_3^K \to \mathbb{Z}_3$, $g, h : \mathbb{Z}_3^{dK} \to \mathbb{Z}_3$:

• Let $x \in \mathbb{Z}_3^K$ and $y \in \mathbb{Z}_3^{dK}$ be independent and uniformly random.

• For each $i \in [K], j \in [d]$, select $(z[i])_j$ independently and uniformly from the elements of
$\mathbb{Z}_3 \setminus \{x_i, (y[i])_j\}$.

• With probability $\frac{1}{4}$, test $f(x) \neq h(z)$; with probability $\frac{3}{4}$, test $g(y) \neq h(z)$.



Figure 1: An illustration of the 2NLin test distribution; d = 3, K = 5



Above is an illustration of the test. We remark that for any given block $i$, $z[i]$ determines $x_i$
(with very high probability), because as soon as $z[i]$ contains two distinct elements of $\mathbb{Z}_3$, $x_i$ must
be the third element of $\mathbb{Z}_3$. Notice also that in every column of indices, the input to $h$ always
differs from the inputs to both $f$ and $g$. Thus, "matching dictator" assignments pass the test with
probability 1. (This is the case in which $f(x) = x_i$ and $g(y) = (y[i])_j$ for some $i \in [K]$, $j \in [d]$.) On
the other hand, if $f$ and $g$ are "nonmatching dictators", then they succeed with only $\frac{11}{12}$ probability.
This turns out to be essentially optimal among functions $f$ and $g$ without "matching influential
coordinates/blocks". We will obtain the following theorem:

Theorem 1.3 restated. For all $\epsilon > 0$, it is NP-hard to $(1, \frac{11}{12} + \epsilon)$-decide the 2NLin problem.

Before proving this, let us further discuss the 2NLin test. Given $x$, $y$, and $z$ from the 2NLin test,
consider the following method of generating two additional strings $y', y'' \in \mathbb{Z}_3^{dK}$ which represent $h$'s
"uncertainty" about $y$. For $i \in [K]$ and $j \in [d]$, if $x_i = (y[i])_j$, then set both $(y'[i])_j$ and $(y''[i])_j$ to the lone
element of $\mathbb{Z}_3 \setminus \{x_i, (z[i])_j\}$. Otherwise, set one of $(y'[i])_j$ or $(y''[i])_j$ to $x_i$, and the other one to $(y[i])_j$.
It can be checked that TwoPair$(x_i, (y[i])_j, (y'[i])_j, (y''[i])_j) = 1$, a more stringent requirement than
satisfying 4NAT. In fact, the marginal distribution on these four variables is a uniformly random
assignment that satisfies the TwoPair predicate. Conditioned on $x$ and $z$, the distribution on $y'$ and
$y''$ is identical to the distribution on $y$. To see this, first note that by construction, neither $(y'[i])_j$
nor $(y''[i])_j$ ever equals $(z[i])_j$. Further, because these indices are distributed as uniformly random
satisfying assignments to TwoPair, $\Pr[(y'[i])_j = x_i] = \Pr[(y''[i])_j = x_i] = \frac{1}{3}$, which matches the
corresponding probability for $y$. Thus, as $y$, $y'$, and $y''$ are distributed identically, we may rewrite
the test's success probability as:

\begin{align*}
\Pr[f, g, \text{ and } h \text{ pass the test}] &= \tfrac{1}{4}\Pr[f(x) \neq h(z)] + \tfrac{3}{4}\Pr[g(y) \neq h(z)] \\
&= \operatorname{avg}\bigl\{\Pr[f(x) \neq h(z)],\ \Pr[g(y) \neq h(z)],\ \Pr[g(y') \neq h(z)],\ \Pr[g(y'') \neq h(z)]\bigr\} \\
&\leq \tfrac{3}{4} + \tfrac{1}{4}\,\mathbf{E}[\text{4NAT}(f(x), g(y), g(y'), g(y''))].
\end{align*}

This is because if 4NAT fails to hold on the tuple $(f(x), g(y), g(y'), g(y''))$, then $h(z)$ can disagree
with at most 3 of them.
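
The claim that a column $(x_i, (y[i])_j, (y'[i])_j, (y''[i])_j)$ is a uniformly random satisfying assignment to TwoPair can be checked by exact enumeration of a single column. The Python sketch below is illustrative only. In particular it assumes that, when $x_i \neq (y[i])_j$, the choice of which of $y'$, $y''$ receives $x_i$ is made by a fair coin, which the text leaves implicit.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

Z3 = (0, 1, 2)

def two_pair(t):
    s = sorted(t)
    return s[0] == s[1] != s[2] == s[3]

dist = Counter()                                # law of (x, y, y', y'') in one column
for x, y in product(Z3, repeat=2):              # x_i and (y[i])_j: independent, uniform
    z_choices = [z for z in Z3 if z not in (x, y)]
    for z in z_choices:                         # (z[i])_j uniform on Z_3 \ {x, y}
        p = Fraction(1, 9) / len(z_choices)
        if x == y:
            lone = next(v for v in Z3 if v not in (x, z))
            dist[(x, y, lone, lone)] += p
        else:                                   # assumed: fair coin for the order
            dist[(x, y, x, y)] += p / 2
            dist[(x, y, y, x)] += p / 2

uniform_on_two_pair = (len(dist) == 18 and
                       all(two_pair(t) for t in dist) and
                       all(p == Fraction(1, 18) for p in dist.values()))
prob_yprime_is_x = sum(p for t, p in dist.items() if t[2] == t[0])
```

Under this assumption the column is uniform over the 18 TwoPair assignments, and $(y'[i])_j$ equals $x_i$ with probability $\frac{1}{3}$.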

At this point, we have removed $h$ from the test analysis and have uncovered what appears to
be a hidden 4NAT test inside the 2NLin Test: simply generate four strings $x$, $y$, $y'$, and $y''$ as
described earlier, and test 4NAT$(f(x), g(y), g(y'), g(y''))$. With some renaming of variables, this is
exactly what our 4NAT Test does:

4NAT Test

Given folded functions $f : \mathbb{Z}_3^K \to \mathbb{Z}_3$, $g : \mathbb{Z}_3^{dK} \to \mathbb{Z}_3$:

• Let $x \in \mathbb{Z}_3^K$ be uniformly random.

• Select $y, z, w$ as follows: for each $i \in [K], j \in [d]$, select $((y[i])_j, (z[i])_j, (w[i])_j)$ uniformly at
random from the elements of $\mathbb{Z}_3^3$ satisfying TwoPair$(x_i, (y[i])_j, (z[i])_j, (w[i])_j)$.

• Test 4NAT$(f(x), g(y), g(z), g(w))$.



Figure 2: An illustration of the 4NAT test distribution; d = 3, K = 5



Above is an illustration of this test. In this illustration, the strings $z$ and $w$ were derived
from the strings in Figure 1 using the process detailed above for generating $y'$ and $y''$. Note that
each column is missing one of the elements of $\mathbb{Z}_3$, and that each column satisfies the TwoPair
predicate. Because satisfying TwoPair implies satisfying 4NAT, matching dictators pass this test
with probability 1. On the other hand, it can be seen that nonmatching dictators pass the test with
probability $\frac{2}{3}$. In the next section we show that this is optimal among functions $f$ and $g$ without
"matching influential coordinates/blocks".

(As one additional remark, our 2NLin Test is basically the composition of the 4NAT Test with
the gadget from Lemma 3.1. In this test, if we instead performed the $f(x) \neq h(z)$ test with
probability $\frac{1}{3}$ and the $g(y) \neq h(z)$ test with probability $\frac{2}{3}$, then the resulting test would basically
be the composition of a 3NLin test with a suitable 3NLin-to-2NLin gadget.)



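3.2 Analysis of 4NAT Test

Let $\omega = e^{2\pi i/3}$, and set $U_3 = \{\omega^0, \omega^1, \omega^2\}$. In what follows, we identify $f$ and $g$ with the functions
$\omega^f$ and $\omega^g$, respectively, whose range is $U_3$ rather than $\mathbb{Z}_3$. Set $L = dK$. The remainder of this
section is devoted to the proof of the following lemma:

Lemma 3.4. Let $f : \mathbb{Z}_3^K \to U_3$ and $g : \mathbb{Z}_3^{dK} \to U_3$. Then

$$\mathbf{E}[\text{4NAT}(f(x), g(y), g(z), g(w))] \leq \frac{2}{3} + \frac{2}{3}\sum_{\alpha \in \mathbb{Z}_3^{dK}} |\hat{f}(\pi_3(\alpha))| \cdot |\hat{g}(\alpha)|^2 \cdot (1/2)^{\#\alpha}.$$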

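The first step is to "arithmetize" the 4NAT predicate. It is not hard to verify that

\begin{align*}
\text{4NAT}(a_1, a_2, a_3, a_4) &= \frac{5}{9} + \frac{1}{9}\sum_{i \neq j} \omega^{a_i}\overline{\omega^{a_j}} - \frac{1}{9}\sum_{i<j<k} \omega^{a_i}\omega^{a_j}\omega^{a_k} - \frac{1}{9}\sum_{i<j<k} \overline{\omega^{a_i}\omega^{a_j}\omega^{a_k}} \\
&= \frac{5}{9} + \frac{2}{9}\sum_{i<j} \Re[\omega^{a_i}\overline{\omega^{a_j}}] - \frac{2}{9}\sum_{i<j<k} \Re[\omega^{a_i}\omega^{a_j}\omega^{a_k}].
\end{align*}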
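
The arithmetization can be verified directly on all 81 inputs. The Python sketch below (function names are our own) evaluates the real-part form of the identity, with constant $\frac{5}{9}$ and coefficients $\pm\frac{2}{9}$, and compares it with the combinatorial definition of 4NAT.

```python
import cmath
from itertools import combinations, product

OMEGA = cmath.exp(2j * cmath.pi / 3)

def four_nat(t):
    return 1 if len(set(t)) < 3 else 0

def arithmetized(t):
    """5/9 + (2/9) * sum_{i<j} Re[w_i conj(w_j)] - (2/9) * sum_{i<j<k} Re[w_i w_j w_k]."""
    w = [OMEGA ** a for a in t]
    pairs = sum((w[i] * w[j].conjugate()).real
                for i, j in combinations(range(4), 2))
    triples = sum((w[i] * w[j] * w[k]).real
                  for i, j, k in combinations(range(4), 3))
    return 5 / 9 + (2 / 9) * pairs - (2 / 9) * triples

max_error = max(abs(arithmetized(t) - four_nat(t))
                for t in product(range(3), repeat=4))
```

The largest deviation `max_error` is zero up to floating-point rounding.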

Using the symmetry between $y$, $z$, and $w$, we deduce

\begin{align*}
&\mathbf{E}[\text{4NAT}(f(x), g(y), g(z), g(w))] \\
&\quad = \frac{5}{9} + \frac{2}{3}\Re\,\mathbf{E}[f(x)\overline{g(y)}] + \frac{2}{3}\Re\,\mathbf{E}[g(y)\overline{g(z)}] - \frac{2}{3}\Re\,\mathbf{E}[f(x)g(y)g(z)] - \frac{2}{9}\Re\,\mathbf{E}[g(y)g(z)g(w)]. \tag{2}
\end{align*}

In the second term in the RHS of (2) we in fact have $\mathbf{E}[f(x)\overline{g(y)}] = 0$. This is because $x$ and $y$ are
independent, and hence $\mathbf{E}[f(x)\overline{g(y)}] = \mathbf{E}[f(x)]\,\overline{\mathbf{E}[g(y)]} = 0 \cdot 0$ since $f$ and $g$ are folded. Regarding
the third term of the RHS in (2), this also turns out to be 0 by virtue of $g$ being folded. This can
be proven using a Fourier-analytic argument; we present here an alternate combinatorial argument:

Lemma 3.5. $\mathbf{E}[g(y)\overline{g(z)}] = 0$.

Proof. In this proof, boldface $\mathbf{x}, \mathbf{y}, \mathbf{z}$ denote the random strings drawn by the test and lightface
$x, y, z$ denote fixed values; we also write $x_\pi \in \mathbb{Z}_3^L$ for the string with $(x_\pi)_i = x_{\pi(i)}$. Fix any value
$y \in \mathbb{Z}_3^L$ for $\mathbf{y}$. Consider the function $t : \mathbb{Z}_3^K \times \mathbb{Z}_3^L \to \mathbb{Z}_3^K \times \mathbb{Z}_3^L$ defined as
$t(x, z) = (x + 1, z - 1)$, where all arithmetic is performed modulo 3. Note that $t$ has order 3,
meaning that $t(t(t(x, z))) = (x, z)$. This allows us to group values for $\mathbf{x}$ and $\mathbf{z}$ into sets of size
three as follows: put $(x, z) \in \mathbb{Z}_3^K \times \mathbb{Z}_3^L$ into the set $T(x, z) = \{(x, z), t(x, z), t(t(x, z))\}$. Because $t$
is invertible and of order 3, each pair $(x, z)$ is a member of only one set: $T(x, z)$.

Conditioned on $\mathbf{y} = y$, if $(x, z)$ is in the support of the test, then all $(x', z') \in T(x, z)$ are also in
the support of the test. This is because the strings which are in the support of the test are exactly
the strings $x$ and $z$ for which the set $\{(x_\pi)_i, y_i, z_i\} \subseteq \mathbb{Z}_3$ is of size 2, for all $i \in [L]$. These strings,
in turn, are exactly those for which $x_\pi + y + z \not\equiv 0 \pmod 3$ in every coordinate. But if $(x', z') = t(x, z)$, then

$$x'_\pi + y + z' = (x_\pi + 1) + y + (z - 1) = x_\pi + y + z \not\equiv 0 \pmod 3.$$

This shows that $t(x, z)$ is in the support of the test, conditioned on $\mathbf{y} = y$. As $T(x', z') = T(x, z)$,
the same holds for $t(t(x, z))$.

When conditioned on $\mathbf{y} = y$, each pair $(x, z)$ in the support of the test occurs with equal
probability. To see this, first note that $\mathbf{x}$ is independent of $\mathbf{y}$. In other words, any value
$x$ for $\mathbf{x}$ is equally likely, regardless of $y$. Then, conditioned on $\mathbf{x} = x$ and $\mathbf{y} = y$, there are exactly
two possibilities for each index of $\mathbf{z}$, both of which occur with half probability. Thus, the event
$(\mathbf{x}, \mathbf{z}) = (x, z)$ occurs with the same probability, no matter the values of $x$ or $z$.

Consider an arbitrary set $T(x, z)$. Conditioned on $(\mathbf{x}, \mathbf{z})$ falling in $T(x, z)$, the value of $(\mathbf{x}, \mathbf{z})$ is
a uniformly random element of this set. This means that $\mathbf{z}$ is equally likely to be $z$, $z - 1$, or $z - 2$.
By the folding of $g$, $g(\mathbf{z})$ is therefore equally likely to be one of $\omega^0$, $\omega^1$, or $\omega^2$. As this happens
for any choice of the set $T(x, z)$, $g(\mathbf{z})$ is uniform on $U_3$, even when conditioned on $\mathbf{y} = y$. Thus,
$\mathbf{E}[g(\mathbf{y})\overline{g(\mathbf{z})}] = 0$ as desired. □
Equation (2) has now been reduced to

$$(2) = \frac{5}{9} - \frac{2}{3}\Re\,\mathbf{E}[f(x)g(y)g(z)] - \frac{2}{9}\Re\,\mathbf{E}[g(y)g(z)g(w)]. \tag{3}$$

As $g(y)g(z)g(w)$ is always in $U_3$, $\Re\,\mathbf{E}[g(y)g(z)g(w)]$ is always at least $-\frac{1}{2}$. Therefore,

$$(3) \leq \frac{2}{3} - \frac{2}{3}\Re\,\mathbf{E}[f(x)g(y)g(z)]. \tag{4}$$

It remains to handle the $\mathbf{E}[f(x)g(y)g(z)]$ term, which is the subject of our next lemma. This is
done through a standard argument in the style of Håstad [Hås01].

Lemma 3.6. $\mathbf{E}[f(x)g(y)g(z)] = \sum_{\alpha \in \mathbb{Z}_3^{dK}} \hat{f}(\pi_3(\alpha))\,\hat{g}(\alpha)^2\,(-\frac{1}{2})^{\#\alpha}$.

Proof. Begin by expanding out $\mathbf{E}[f(x)g(y)g(z)]$:

$$\mathbf{E}[f(x)g(y)g(z)] = \sum_{|\alpha| \equiv |\beta| \equiv |\gamma| \equiv 1 \pmod 3} \hat{f}(\alpha)\hat{g}(\beta)\hat{g}(\gamma)\,\mathbf{E}[\chi_\alpha(x)\chi_\beta(y)\chi_\gamma(z)]. \tag{5}$$
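We focus on the products of the Fourier characters:

$$\mathbf{E}[\chi_\alpha(x)\chi_\beta(y)\chi_\gamma(z)] = \prod_{i \in [K]} \mathbf{E}[\chi_{\alpha_i}(x_i)\chi_{\beta[i]}(y[i])\chi_{\gamma[i]}(z[i])]. \tag{6}$$

We can attend to each block separately:

\begin{align*}
\mathbf{E}[\chi_{\alpha_i}(x_i)\chi_{\beta[i]}(y[i])\chi_{\gamma[i]}(z[i])] &= \mathbf{E}\bigl[\omega^{\alpha_i x_i + \beta[i] \cdot y[i] + \gamma[i] \cdot z[i]}\bigr] \\
&= \mathbf{E}_{x_i}\Bigl[\omega^{\alpha_i x_i} \prod_{j : \pi(j) = i} \underbrace{\mathbf{E}\bigl[\omega^{\beta_j y_j + \gamma_j z_j} \mid x_i\bigr]}_{(*)}\Bigr]. \tag{7}
\end{align*}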

Now, consider the expectation (*). Conditioned on $x_i = a$, the distribution on the values for $(y_j, z_j)$ is uniform on the
six possibilities $(a + 1, a + 1)$, $(a + 2, a + 2)$, $(a, a + 1)$, $(a, a + 2)$, $(a + 1, a)$, and $(a + 2, a)$. We
claim that (*) is nonzero if and only if $\beta_j \equiv \gamma_j \pmod 3$. If, on the other hand, $\beta_j \not\equiv \gamma_j \pmod 3$,
then either only one of $\beta_j$ or $\gamma_j$ is zero, or neither is zero and $-\beta_j \equiv \gamma_j \pmod 3$. In the first case,
the expectation is either $\mathbf{E}[\omega^{\beta_j y_j} \mid x_i = a]$ or $\mathbf{E}[\omega^{\gamma_j z_j} \mid x_i = a]$ for a nonzero $\beta_j$ or a nonzero $\gamma_j$,
respectively. Both of these expectations are zero, as both $y_j$ and $z_j$ are uniform on $\mathbb{Z}_3$. In the
second case,

\begin{align*}
\mathbf{E}[\omega^{\beta_j y_j + \gamma_j z_j} \mid x_i = a] &= \mathbf{E}[\omega^{\beta_j y_j - \beta_j z_j} \mid x_i = a] \\
&= \mathbf{E}[\omega^{\beta_j (y_j - z_j)} \mid x_i = a],
\end{align*}

which is zero, because $\beta_j$ is nonzero, and $y_j - z_j$ is uniformly distributed on $\mathbb{Z}_3$.

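Thus, when (*) and Equation (6) are nonzero, $\beta \equiv \gamma \pmod 3$. This means that (*) $=
\mathbf{E}[\omega^{\beta_j (y_j + z_j)} \mid x_i = a]$. When $\beta_j = 0$, this is clearly 1. Otherwise, as either $y_j + z_j \equiv 2a + 1
\pmod 3$ or $y_j + z_j \equiv 2a + 2 \pmod 3$, each with probability half, this is equal to

$$(*) = \frac{1}{2}\omega^{\beta_j (2a+1)} + \frac{1}{2}\omega^{\beta_j (2a+2)} = \frac{\omega^{2a\beta_j}}{2}(\omega^1 + \omega^2) = -\frac{\omega^{2a\beta_j}}{2}.$$

In summary, when $\beta = \gamma$, (*) $= (-\frac{1}{2})^{\#\beta_j}\,\omega^{2a\beta_j}$, where $\#\beta_j$ is 1 if $\beta_j \neq 0$ and 0 otherwise.
We can now rewrite Equation (7) as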

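\begin{align*}
(7) &= \mathbf{E}_{a}\Bigl[\omega^{\alpha_i a} \prod_{j : \pi(j) = i} \Bigl(-\frac{1}{2}\Bigr)^{\#\beta_j} \omega^{2a\beta_j}\Bigr] \\
&= \Bigl(-\frac{1}{2}\Bigr)^{\#\beta[i]}\,\mathbf{E}_{a}\bigl[\omega^{(\alpha_i + 2|\beta[i]|)a}\bigr].
\end{align*}

Note that the exponent of $\omega$, $(\alpha_i + 2|\beta[i]|)a$, is zero if $\alpha_i \equiv |\beta[i]| \pmod 3$, in which case the
expectation is just the constant $(-1/2)^{\#\beta[i]}$. This occurs for all $i \in [K]$ exactly when $\alpha = \pi_3(\beta)$.
If, on the other hand, $\alpha_i + 2|\beta[i]|$ is nonzero, then the entire expectation is zero because $a$, the
value of $x_i$, is uniformly random from $\mathbb{Z}_3$. Thus, Equation (6) is nonzero only when $\alpha = \pi_3(\beta)$ and
$\beta = \gamma$, in which case it equals

$$(6) = \prod_{i \in [K]} \Bigl(-\frac{1}{2}\Bigr)^{\#\beta[i]} = \Bigl(-\frac{1}{2}\Bigr)^{\#\beta}.$$

We may therefore conclude with

$$(5) = \sum_{\alpha \in \mathbb{Z}_3^{dK}} \hat{f}(\pi_3(\alpha))\,\hat{g}(\alpha)^2 \Bigl(-\frac{1}{2}\Bigr)^{\#\alpha}.$$ □

Substituting this result into (4) yields

\begin{align*}
\mathbf{E}[\text{4NAT}(f(x), g(y), g(z), g(w))] &\leq \frac{2}{3} - \frac{2}{3}\Re \sum_{\alpha \in \mathbb{Z}_3^{dK}} \hat{f}(\pi_3(\alpha))\,\hat{g}(\alpha)^2 \Bigl(-\frac{1}{2}\Bigr)^{\#\alpha} \\
&\leq \frac{2}{3} + \frac{2}{3}\sum_{\alpha \in \mathbb{Z}_3^{dK}} |\hat{f}(\pi_3(\alpha))| \cdot |\hat{g}(\alpha)|^2 \cdot (1/2)^{\#\alpha},
\end{align*}

completing the proof of Lemma 3.4.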
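
As a sanity check on Lemma 3.6, both sides of the identity can be computed on a tiny instance. The Python sketch below is a hedged illustration rather than part of the proof: it fixes $K = 1$ and $d = 2$, uses random folded $f$ and $g$ (seed and helper names are assumptions), evaluates the left-hand side by enumerating the test distribution, and evaluates the right-hand side from brute-force Fourier coefficients.

```python
import cmath
import random
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)
Z3 = (0, 1, 2)
D = 2                                            # block size; a single block (K = 1)

def two_pair(t):
    s = sorted(t)
    return s[0] == s[1] != s[2] == s[3]

def random_folded(n, rng):
    f = {}
    for x in product(Z3, repeat=n):
        if x[0] == 0:                            # one representative per shift orbit
            base = rng.randrange(3)
            for c in Z3:
                f[tuple((xi + c) % 3 for xi in x)] = (base + c) % 3
    return f

def hat(f, alpha):
    """Fourier coefficient of omega^f at alpha, by direct summation."""
    return sum(OMEGA ** (v - sum(a * xi for a, xi in zip(alpha, x)))
               for x, v in f.items()) / len(f)

rng = random.Random(2)
f, g = random_folded(1, rng), random_folded(D, rng)

# Left-hand side: exact expectation of f(x) g(y) g(z) over the test distribution.
lhs, count = 0, 0
for x in Z3:
    cols = [t for t in product(Z3, repeat=3) if two_pair((x,) + t)]
    for choice in product(cols, repeat=D):
        y = tuple(col[0] for col in choice)
        z = tuple(col[1] for col in choice)
        lhs += OMEGA ** (f[(x,)] + g[y] + g[z])
        count += 1
lhs /= count

# Right-hand side: sum over alpha of hat f(pi_3(alpha)) * hat g(alpha)^2 * (-1/2)^{#alpha}.
rhs = sum(hat(f, (sum(alpha) % 3,)) * hat(g, alpha) ** 2
          * (-0.5) ** sum(a != 0 for a in alpha)
          for alpha in product(Z3, repeat=D))
```

The two quantities `lhs` and `rhs` agree up to floating-point error on this instance.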



4 Hardness of 4NAT 

In this section, we show the following theorem: 

Theorem 1.4 (detailed). For all $\epsilon > 0$, it is NP-hard to $(1, \frac{2}{3} + \epsilon)$-decide the 4NAT problem. In
fact, in the "yes case", all 4NAT constraints can be satisfied by TwoPair assignments.

Combining this with Lemma 3.1 yields Theorem 1.3, and combining this with Corollary 3.3
yields Theorem 1.2. It is not clear whether this gives optimal hardness assuming perfect completeness.
The 4NAT predicate is satisfied by a uniformly random input with probability $\frac{5}{9}$, and by the
method of conditional expectation this gives a deterministic algorithm which $(1, \frac{5}{9})$-approximates
the 4NAT CSP. This leaves a gap of $\frac{1}{9}$ in the soundness, and to our knowledge there are no better
known algorithms.

On the hardness side, consider a uniformly random satisfying assignment to the TwoPair predicate.
It is easy to see that each of the four variables is assigned a uniformly random value from $\mathbb{Z}_3$,
and also that the variables are pairwise independent. As any satisfying assignment to the TwoPair
predicate also satisfies the 4NAT predicate, the work of Austrin and Mossel [AM09] immediately
implies that $(1 - \epsilon, \frac{5}{9} + \epsilon)$-approximating the 4NAT problem is NP-hard under the Unique Games
Conjecture. Thus, if we are willing to sacrifice a small amount in the completeness, we can improve
the soundness parameter in Theorem 1.4. Whether we can improve upon the soundness without
sacrificing perfect completeness is open.
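The counting facts used in the last two paragraphs can be verified by enumerating $\mathbb{Z}_3^4$. The sketch below is an illustration, not part of the proof; it assumes that 4NAT accepts a tuple exactly when at least one element of $\mathbb{Z}_3$ is missing from it, and that TwoPair accepts exactly the tuples in which two distinct values each appear twice. Both definitions are stated elsewhere in the paper and are reproduced here as assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

Z3 = range(3)
ALL = list(product(Z3, repeat=4))


def four_nat(t):
    # Assumed definition: not all three elements of Z_3 appear in the tuple.
    return len(set(t)) <= 2


def two_pair(t):
    # Assumed definition: exactly two distinct values, each appearing twice.
    return sorted(t.count(v) for v in set(t)) == [2, 2]


# Probability that a uniformly random input satisfies 4NAT.
random_accept = Fraction(sum(four_nat(t) for t in ALL), len(ALL))

# Satisfying assignments of TwoPair, and the containment TwoPair => 4NAT.
TWO_PAIR = [t for t in ALL if two_pair(t)]
two_pair_implies_four_nat = all(four_nat(t) for t in TWO_PAIR)


def pairwise_uniform(tuples):
    """True iff every pair of coordinates is uniform on Z_3 x Z_3, which gives
    both uniform marginals and pairwise independence."""
    for i, j in combinations(range(4), 2):
        for a, b in product(Z3, repeat=2):
            count = sum(1 for t in tuples if t[i] == a and t[j] == b)
            if Fraction(count, len(tuples)) != Fraction(1, 9):
                return False
    return True
```

Under these assumptions, `random_accept` is the random-assignment threshold, and `pairwise_uniform(TWO_PAIR)` is the property needed to invoke [AM09].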

We now arrive at the proof of Theorem 1.4. The proof is entirely standard, and proceeds by
reduction from d-to-1 Label Cover. It makes use of our analysis of the 4NAT Test, which is presented
in Appendix 3.2. One preparatory note: most of the proof concerns functions $f : \mathbb{Z}_3^K \to \mathbb{Z}_3$ and
$g : \mathbb{Z}_3^{dK} \to \mathbb{Z}_3$. However, we will also be making use of Fourier analytic notions defined in Section 2.3,
and this requires dealing with functions whose range is $U_3$ rather than $\mathbb{Z}_3$. Thus, we associate $f$
and $g$ with the functions $\omega^f$ and $\omega^g$, and whenever Fourier analysis is used it will actually be with
respect to the latter two functions.

Proof. Let $G = (U \cup V, E)$ be a d-to-1 Label Cover instance with alphabet size $K$ and d-to-1 maps
$\pi_e : [dK] \to [K]$ for each edge $e \in E$. We construct a 4NAT instance by replacing each vertex
in $G$ with its Long Code and placing constraints on adjacent Long Codes corresponding to the
tests made in the 4NAT Test. Thus, each $u \in U$ is replaced by a copy of the hypercube $\mathbb{Z}_3^K$ and
labeled by the function $f_u : \mathbb{Z}_3^K \to \mathbb{Z}_3$. Similarly, each $v \in V$ is replaced by a copy of the
hypercube $\mathbb{Z}_3^{dK}$ and labeled by the function $g_v : \mathbb{Z}_3^{dK} \to \mathbb{Z}_3$. Finally, for each edge $\{u, v\} \in E$, a
set of 4NAT constraints is placed between $f_u$ and $g_v$ corresponding to the constraints made in the
4NAT Test, and given a weight equal to the probability the constraint is tested in the 4NAT Test
multiplied by the weight of $\{u, v\}$ in $G$. This produces a 4NAT instance whose weights sum to 1
and which is equivalent to the following test:

• Pick an edge $e = (u, v) \in E$ uniformly at random.

• Reorder the indices of $g_v$ so that the $k$th group of $d$ indices corresponds to $\pi_e^{-1}(k)$.

• Run the 4NAT test on $f_u$ and $g_v$. Accept iff it does.



Completeness. If the original Label Cover instance is fully satisfiable, then there is a function
$F : U \cup V \to [dK]$ for which $\mathrm{val}(F) = 1$. Set each $f_u$ to the dictator assignment $f_u(x) = x_{F(u)}$
and each $g_v$ to the dictator assignment $g_v(y) = y_{F(v)}$. Let $e = \{u, v\} \in E$. Because $F$ satisfies the
constraint $\pi_e$, $F(u) = \pi_e(F(v))$. Thus, $f_u$ and $g_v$ correspond to "matching dictator" assignments,
and above we saw that matching dictators pass the 4NAT Test with probability 1. As this applies
to every edge in $E$, the 4NAT instance is fully satisfiable.

Soundness. Assume that there are functions $\{f_u\}_{u \in U}$ and $\{g_v\}_{v \in V}$ which satisfy at least a $\frac{2}{3} + \epsilon$
fraction of the 4NAT constraints. Then there is at least an $\epsilon/2$ fraction of the edges $e = \{u, v\} \in E$
for which $f_u$ and $g_v$ pass the 4NAT Test with probability at least $\frac{2}{3} + \epsilon/2$. This is because otherwise
the fraction of 4NAT constraints satisfied would be at most

\[ \left(1 - \frac{\epsilon}{2}\right)\left(\frac{2}{3} + \frac{\epsilon}{2}\right) + \frac{\epsilon}{2}\,(1) = \frac{2}{3} + \frac{2\epsilon}{3} - \frac{\epsilon^2}{4} < \frac{2}{3} + \epsilon. \]
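The arithmetic in this averaging step can be checked with exact rational numbers. The following is a small sanity check rather than part of the argument; the function name is hypothetical.

```python
from fractions import Fraction


def bound_without_good_edges(eps):
    """Upper bound on the satisfied fraction if fewer than an eps/2 fraction of
    edges pass with probability at least 2/3 + eps/2: those edges contribute at
    most 1 each, and every other edge contributes less than 2/3 + eps/2."""
    eps = Fraction(eps)
    return (1 - eps / 2) * (Fraction(2, 3) + eps / 2) + (eps / 2) * 1
```

For any rational $\epsilon > 0$ the returned value equals $\frac{2}{3} + \frac{2\epsilon}{3} - \frac{\epsilon^2}{4}$, which is strictly below $\frac{2}{3} + \epsilon$.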

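Let $E'$ be the set of such edges, and consider $\{u, v\} \in E'$. Set $L = dK$. By Lemma 3.4,

\[ \frac{2}{3} + \frac{\epsilon}{2} \le \Pr[f_u \text{ and } g_v \text{ pass the 4NAT test}] \le \frac{2}{3} + \frac{2}{3} \sum_{\alpha \in \mathbb{Z}_3^L} |\hat{f}_u(\pi_3(\alpha))|\,|\hat{g}_v(\alpha)|^2\,(1/2)^{\#\alpha}, \]

meaning that

\[ \frac{3\epsilon}{4} \le \sum_{\alpha \in \mathbb{Z}_3^L} |\hat{f}_u(\pi_3(\alpha))|\,|\hat{g}_v(\alpha)|^2\,(1/2)^{\#\alpha}. \tag{8} \]

Parseval's equation tells us that $\sum_{\alpha \in \mathbb{Z}_3^L} |\hat{g}_v(\alpha)|^2 = 1$. The function $|\hat{g}_v|^2$ therefore induces a probability
distribution on the elements of $\mathbb{Z}_3^L$. As a result, we can rewrite Equation (8) as

\[ \frac{3\epsilon}{4} \le \mathbb{E}_{\alpha \sim |\hat{g}_v|^2}\Big[\,|\hat{f}_u(\pi_3(\alpha))|\,(1/2)^{\#\alpha}\Big]. \tag{9} \]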



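As previously noted, $|\hat{f}_u(\pi_3(\alpha))|$ is at most 1 for all $\alpha$, so the expression in this expectation is
never greater than 1. We can thus conclude that

\[ \frac{3\epsilon}{8} \le \Pr_{\alpha \sim |\hat{g}_v|^2}\Big[\underbrace{|\hat{f}_u(\pi_3(\alpha))|\,(1/2)^{\#\alpha} \ge \frac{3\epsilon}{8}}_{\mathrm{GOOD}_\alpha}\Big], \]

as otherwise the expectation in Equation (9) would be less than $3\epsilon/4$. Call the event in the
probability $\mathrm{GOOD}_\alpha$. When $\mathrm{GOOD}_\alpha$ occurs, the following happens: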

• $|\hat{f}_u(\pi_3(\alpha))|^2 \ge 9\epsilon^2/64$.

• $\#\alpha \le \log_2(8/(3\epsilon))$. Furthermore, as $f_u$ is folded, $\#\alpha > 0$.
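The threshold $3\epsilon/8$ and the two consequences listed above follow from a short averaging computation, sketched here for convenience. The symbols $X$, $t$ and $p$ are introduced only for this sketch.

```latex
% X = |\hat f_u(\pi_3(\alpha))| (1/2)^{\#\alpha}, where \alpha is drawn with
% probability |\hat g_v(\alpha)|^2, so that 0 \le X \le 1.
% Write t = 3\epsilon/8 and p = \Pr[X \ge t].
\begin{align*}
\mathbb{E}[X] &\le p \cdot 1 + (1 - p) \cdot t \le p + t, \\
\text{so}\quad p &\ge \mathbb{E}[X] - t \ge \frac{3\epsilon}{4} - \frac{3\epsilon}{8} = \frac{3\epsilon}{8}.
\end{align*}
% On the event X \ge t, each of the two factors of X is at most 1,
% hence each factor is itself at least t:
\begin{align*}
|\hat f_u(\pi_3(\alpha))| \ge \frac{3\epsilon}{8}
  &\implies |\hat f_u(\pi_3(\alpha))|^2 \ge \frac{9\epsilon^2}{64}, \\
(1/2)^{\#\alpha} \ge \frac{3\epsilon}{8}
  &\implies \#\alpha \le \log_2\!\big(8/(3\epsilon)\big).
\end{align*}
```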

This suggests the following randomized decoding procedure for each $u \in U$: pick an element
$\beta \in \mathbb{Z}_3^K$ with probability $|\hat{f}_u(\beta)|^2$ and choose one of its nonzero coordinates uniformly at random.
Similarly, for each $v \in V$, pick an element $\alpha \in \mathbb{Z}_3^{dK}$ with probability $|\hat{g}_v(\alpha)|^2$ and choose one of its
nonzero coordinates uniformly at random. In both cases, nonzero coordinates are guaranteed to
exist because all the $f_u$'s and $g_v$'s are folded.
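For very small dimensions, the decoding procedure can be simulated directly. The sketch below is illustrative only: it assumes the Fourier convention $\hat{f}(\alpha) = \mathbb{E}_x[\omega^{f(x) - \alpha \cdot x}]$ (the actual definition is in Section 2.3 and may differ by conjugation, which does not affect the sampling weights), it enumerates all $3^n$ points and so is exponential in $n$, and the function names are hypothetical.

```python
import cmath
import random
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity


def fourier_coefficients(f, n):
    """Fourier coefficients of omega^f for f: Z_3^n -> Z_3, under the assumed
    convention hat f(alpha) = E_x[omega^(f(x) - alpha . x)]."""
    points = list(product(range(3), repeat=n))
    coeffs = {}
    for alpha in points:
        total = sum(
            OMEGA ** ((f(x) - sum(a * xi for a, xi in zip(alpha, x))) % 3)
            for x in points
        )
        coeffs[alpha] = total / len(points)
    return coeffs


def decode(f, n, rng):
    """Sample alpha with probability |hat f(alpha)|^2, then return a uniformly
    random nonzero coordinate of alpha (as a 0-based index)."""
    coeffs = fourier_coefficients(f, n)
    # Drop coefficients that are zero up to floating-point error; for folded
    # functions this also removes alpha = 0, which has no nonzero coordinate.
    weighted = [(alpha, abs(c) ** 2) for alpha, c in coeffs.items() if abs(c) ** 2 > 1e-12]
    alphas = [alpha for alpha, _ in weighted]
    weights = [w for _, w in weighted]
    alpha = rng.choices(alphas, weights=weights, k=1)[0]
    support = [i for i, a in enumerate(alpha) if a != 0]
    return rng.choice(support)
```

For a dictator such as `lambda x: x[1]`, all of the Fourier weight sits on a single element with one nonzero coordinate, so `decode` always returns that coordinate.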

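Now we analyze how well this decoding scheme performs for the edges $e = \{u, v\} \in E'$ (we
may assume the other edges are unsatisfied). Suppose that when the elements of $\mathbb{Z}_3^K$ and $\mathbb{Z}_3^{dK}$ were
randomly chosen, $g_v$'s element $\alpha$ was in $\mathrm{GOOD}_\alpha$, and $f_u$'s element $\beta$ equals $\pi_3(\alpha)$. Then, as $\#\alpha \le \log_2(8/(3\epsilon))$,
and each label in $\pi_3(\alpha)$ has at least one label in $\alpha$ which maps to it, the probability that matching
labels are drawn is at least $1/\log_2(8/(3\epsilon))$. Next, the probability that such an $\alpha$ and $\beta$ are drawn is

\[ \sum_{\alpha \in \mathrm{GOOD}} |\hat{f}_u(\pi_3(\alpha))|^2\,|\hat{g}_v(\alpha)|^2 \ge \frac{9\epsilon^2}{64} \sum_{\alpha \in \mathrm{GOOD}} |\hat{g}_v(\alpha)|^2 \ge \frac{9\epsilon^2}{64} \cdot \frac{3\epsilon}{8} = \frac{27\epsilon^3}{512}. \]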



Combining these, the probability that this edge is satisfied is at least $27\epsilon^3 / (512 \log_2(8/(3\epsilon)))$.
Thus, the decoding scheme satisfies at least

\[ \frac{27\epsilon^3}{512 \log_2(8/(3\epsilon))} \cdot \frac{|E'|}{|E|} \ge \frac{27\epsilon^4}{1024 \log_2(8/(3\epsilon))} \]



fraction of the Label Cover edges in expectation. By the probabilistic method, an assignment to 
the Label Cover instance must therefore exist which satisfies at least this fraction of the edges. 

We now apply Theorem 2.1, setting the soundness value in that theorem equal to $O(\epsilon^5)$, which
concludes the proof. □